Router

ABSTRACT

Configuration of reconfigurable multidimensional fields is described. Information is provided for handling feedback, among other things.

FIELD OF THE INVENTION

The present invention relates to configurable modules and the like, in particular the management of data streams therein, and more particularly the placement of resources and the routing of connections between cells, etc.

BACKGROUND INFORMATION

Multidimensional fields of data processing cells are already known. The generic class of these modules includes in particular systolic arrays, neural networks, multiprocessor systems, processors having a plurality of arithmetic units and/or logic cells and/or communicative/peripheral cells (IO), interconnection and network modules such as crossbar switches as well as known modules of the generic types FPGA, DPGA, Chameleon, XPUTER, etc. In particular there are known modules in which first cells are reconfigurable during run time without interfering with the operation of other cells (see, for example, German Patent No. 44 16 881, German Patent Application Nos. DE 197 81 412.3, DE 197 81 483.2, DE 196 54 846.2-53, DE 196 54 593.5-53, DE 197 04 044.6-53, DE 198 80 129.7, DE 198 61 088.2-53, and DE 199 80 312.9, International Application No. PCT/DE 00/01869, German Patent Application Nos. DE 100 36 627.9-33, DE 100 28 397.7, DE 101 10 530.4, and DE 101 11 014.6, International Application No. PCT/EP 00/10516, and European Application No. EP 01 102 674.7). These are herewith incorporated fully into the present text for disclosure purposes. Reference is also made to the Chameleon system processor architecture. However, the usability of the structure mentioned last for data processing purposes is more comparable to an arrangement described in German Patent Application No. DE 101 03 624.

The data processing cells of these modules may now execute different functions such as Boolean and/or arithmetic operations on input operands. Connections running between the cells are also adjustable and typically include buses capable of interconnecting in various ways and thus creating a multidimensional field whose interconnection is adjustable. The cells exchange information such as status signals, triggers or the data to be processed over the buses or other lines. The cells are typically arranged in rows and columns in a two-dimensional processor field, with the outputs of cells of a first row being connected to buses to which the inputs of cells of the next row are also to be connected. In a conventional design (Pact XPP), forward and backward registers are also provided for carrying data while bypassing cells on bus systems of other rows, achieving a balance of branches to be executed simultaneously, etc. There have already been proposals for providing such forward and/or backward registers with a functionality that goes beyond pure data transfer.

In general, however, it is necessary to define which cell performs which data processing steps, where this cell is situated and how it is connected. In the related art, strategies for automatic control of placement mechanisms and routing mechanisms are already known.

Placers, for example, typically operate according to a force method, which uses forces between cells for optimum placement of dependent cells by simulating connections by springs in a physical model. This usually yields a mostly suitable placement result.

In addition, German Patent No. 44 16 881, and German Patent Application Nos. DE 196 54 846.2-53 and DE 102 06 653.1 describe data processing methods for reconfigurable modules in which data is read out of one or more memories in each processing step and is then processed and written to one or more memories. According to the related art, the read and write memories are placed differently and are typically placed in opposition (Figures xxua, xxub, xxuc and German Patent Application No. DE 102 06 653.1, FIG. 3).

Special reconfiguration methods (wave reconfiguration) are also described in German Patent Application Nos. DE 197 04 728.9, DE 199 26 538.0, DE 100 28 397.7 for the aforementioned modules, thus permitting particularly efficient reconfiguration by jointly transmitting the reconfiguration information together with the last data to be processed via the data buses and/or trigger buses, and by reconfiguring the buses and cells immediately after successful processing.

To perform a certain type of data processing, each cell must be assigned a certain function and at the same time a suitable position in space and interconnection must be provided. Therefore, before the multidimensional processor field processes data as desired, it is necessary to ascertain which cell is to execute which function; a function must be defined for each cell involved in a data processing task, and the interconnection must be determined.

SUMMARY

An object of the present invention is to provide a novel embodiment for commercial use.

First, a method is proposed for creating configurations for multidimensional fields of reconfigurable cells for implementing given applications, in which an application is broken down into individual modules and the elements necessary for execution are placed module by module. Such a breakdown into modules is advantageous, because configurations may then be determined more easily for these modules.

It may be particularly preferable if, in at least one module, stationary elements are provided at predetermined locations and the non-stationary elements are subsequently placed. It is then possible to place the elements of the modules by minimizing virtual forces assigned among the individual mobile and/or immobile objects.

Generally, it may also be desirable to arrange the function and interconnection in such a way that data processing may be performed as promptly as possible and with the best possible use of resources. Frequently however, e.g., due to hardware restrictions, it is impossible to find an arrangement that will ensure the desired data transfer in an optimum manner. Suboptimal arrangements must then be used.

It is now further proposed according to the present invention that, to improve the configuration for multidimensional fields of reconfigurably interconnected data processing cells, the required connections between the cells be prioritized, with connections having a high priority being established first and other connections being established subsequently.

This minimizes the use of suboptimal configurations because it ensures that data may stream, with fewer restrictions due to a shortage of resources such as a limited number of buses, etc., over connections that are particularly important, e.g., due to a required low latency time, etc.

It is therefore also preferably possible for connections to be prioritized, taking into account in particular an allowable delay in data processing. Prioritization may be performed by taking into account the maximum allowed delay and/or delay ratios of different connections. Delay ratios to be taken into account in prioritization preferably include a delay of “0,” “longer than,” “longer than or equal to” and “equal to.”
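The ordering step described above can be sketched as follows. This is only an illustrative sketch: the `Connection` type, the `max_delay` field, and the rule that a smaller allowed delay means a higher priority are assumptions made for the example and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    source: str
    target: str
    max_delay: int  # hypothetical: maximum allowed delay, e.g. in register stages


def routing_order(connections):
    """Return the connections sorted so that the tightest delay budget comes first.

    Assumption: a smaller allowed delay is treated as a higher priority.
    sorted() is stable, so connections with equal budgets keep their input order.
    """
    return sorted(connections, key=lambda c: c.max_delay)


connections = [
    Connection("A", "B", max_delay=3),
    Connection("B", "C", max_delay=0),
    Connection("A", "C", max_delay=1),
]
order = routing_order(connections)
```

A router would then try to establish the connections in the sequence given by `order`, so that scarce buses are consumed first by the connections that tolerate the least delay.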

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 a and 1 b show PAE cells of an XPP architecture flanked by forward and backward registers.

FIG. 2 shows how data from an output port converges at a node and how this may happen at an input of a cell.

FIG. 3 shows how data converges at a node.

FIG. 4 shows how data converges at a node.

Figures xxua-xxuc show that the direction of data flow of each successive configuration changes.

Figures xxva-xxvc show an example of a plurality of arrangements for two reconfiguration cycles.

Figures xxwa-xxwc show that corresponding arrangements from several sides of the array may also be used at the same time; two corresponding reconfiguration cycles are illustrated.

Figure xxsa shows that after each processing of a last data word, the next following configuration may be set immediately; only after reconfiguration of all the cells and buses involved is it possible to begin with the next data processing.

Figure xxsb shows largely maintaining the direction of flow between the cells and merely exchanging the bus systems of the memories.

Figure xxta shows memories for reading data and writing data situated close together.

Figure xxtb shows that in performing a reconfiguration, only the bus systems between the read/write memories are exchanged.

Figure xxxa shows the introduction of registers into the long feedback buses at regular intervals.

Figure xxxb shows all the cells of a loop arranged as locally as possible around a loop head; the loop foot is placed as close to the loop head as possible.

Figure xxxc shows a helical arrangement.

Figure xxxd shows all the cells of a loop arranged as locally as possible around a loop head; the loop foot is placed as close to the loop head as possible.

Figure xxxe shows a wave-shaped pattern.

Figure xxxf shows a long feedback bus.

Figure xxxg shows a coil.

Figure xxxh shows rolling out a loop in three directions.

DETAILED DESCRIPTION

The connections among the cells of a configuration are produced by defining a boundary around cells and attempting first to connect the cells by connections within that boundary. This is demonstrated with respect to FIGS. 1 a and 1 b, where the PAE cells of an XPP architecture of the present applicant are shown as elongated and flanked by forward and backward registers labeled as “FR” and “BR,” respectively. A field part is delimited by a dotted line depicting the boundary. A route search will typically progress from the starting cell to the target cell only in the X direction, i.e., horizontally; if no more progress is possible in one row in the X direction, e.g., because no more suitable buses are available, then the row is changed, i.e., the search moves in the Y direction. FIG. 1 b shows an example of a possible connecting line when a direct connection is no longer possible between given cells.
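The route search described above may be illustrated with a simplified sketch. It assumes a plain grid of positions, a set `blocked` standing in for positions where no suitable bus is available, and a rectangular `bounds` tuple standing in for the dotted boundary; all of these names are hypothetical. The search is greedy and does not backtrack, so it is not a complete router.

```python
def route(start, target, blocked, bounds):
    """Greedy route search: advance in X while possible, otherwise change row (Y).

    start, target: (x, y) positions; blocked: set of unusable positions;
    bounds: (xmin, ymin, xmax, ymax), inclusive boundary around the cells.
    Returns the list of visited positions, or None if the search gets stuck.
    """
    xmin, ymin, xmax, ymax = bounds

    def free(p):
        return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax and p not in blocked

    path, visited = [start], {start}
    x, y = start
    while (x, y) != target:
        dx = (target[0] > x) - (target[0] < x)  # sign of remaining X distance
        dy = (target[1] > y) - (target[1] < y)  # sign of remaining Y distance
        candidates = []
        if dx:
            candidates.append((x + dx, y))      # prefer progress in X
        if dy:
            candidates.append((x, y + dy))      # otherwise move toward the target row
        else:
            candidates += [(x, y + 1), (x, y - 1)]  # already on target row: detour
        step = next((p for p in candidates if free(p) and p not in visited), None)
        if step is None:
            return None                         # no route found inside the boundary
        x, y = step
        path.append(step)
        visited.add(step)
    return path
```

A `None` result corresponds to the case in which a connection cannot be established within the boundary, at which point a caller might retry with a larger `bounds`.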

It is possible that if all the required connections cannot be established within the boundary, a connection may be established outside of the boundary. If another connection cannot be established as needed, then in both cases, i.e., inside or outside, an existing connection should be severed and the additional connection established, whereupon an attempt is made to provide a replacement for the severed connection. However, it may be preferable to wait before going beyond the boundary until it is certain that no additional connections are establishable within the boundary even by disconnecting others.

It is possible to provide connections on which a plurality of outputs are combined and are connected to a plurality of inputs, a connection being established in such a way that a path segment separates the input nodes and the output splits. This is illustrated in FIGS. 2 through 4, which show possible allowed and unallowed connections. FIG. 2 shows in general how data from an output port, i.e., an output terminal, converges at a node (arrow A) and how this may happen at an input of a cell (arrow B). FIG. 2 thus shows possibilities for different paths along which data may run from object B (cell) at the top right to a lower object. The lower object may be, for example, a PAE, an IOPAE, etc. FIGS. 3 and 4 show how data converges in an allowed manner at nodes (FIG. 3) because a single path segment is provided between output splits (outport splits) and input nodes (inport joins) between each route.

It is preferable if, after establishing the connections, the maximum latency time of the configuration is determined and/or a maximum frequency corresponding to it for the configuration operation is determined. This information may be used to evaluate the quality of the configuration result and/or for data processing using the configuration.

It is also preferable if, after determining all signal propagation paths along all connections, a propagation-time equalization is performed for signals converging at nodes. In the applicant's XPP technology, for which the present application is particularly preferred, this is possible by providing register stages which may be inserted into the connecting pathways, in particular when changing the cell row. First, a connection to the register may be established and then the number of registers required for balancing is determined. This procedure is particularly advantageous in placement and routing.
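The balancing step can be illustrated with a small sketch. It assumes, purely for the example, that each path converging at a node is described by its delay counted in register stages; the function name and the dictionary layout are hypothetical.

```python
def balance_registers(path_delays):
    """Return the number of extra register stages to insert on each converging path.

    path_delays: mapping of input name -> delay of that path in register stages
    (an assumed unit). Shorter paths are padded up to the longest one so that
    all signals arrive at the node in the same cycle.
    """
    if not path_delays:
        return {}
    longest = max(path_delays.values())
    return {name: longest - delay for name, delay in path_delays.items()}


extra = balance_registers({"a": 3, "b": 1, "c": 3})
```

Here the path `b` would receive two additional register stages, while `a` and `c` are left unchanged.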

In the related art there is still occasionally a problem which it would be advantageous to at least partially relieve in certain situations. Namely, the automatically created placement for feedback, i.e., for program loops, for example, in which data is returned from a downstream cell to a cell which has previously processed data, is frequently so inefficient that the feedback must go too far, i.e., the feedback bus is too long (Figure xxxf). In other words, the sender and receiver of feedback are too far apart. This greatly reduces the processing frequency of reconfigurable modules.

It is desirable now to create a possibility for improving the arrangement and/or interconnection of cells and/or modules containing cells.

A first approach according to the present invention provides a remedy here by introducing registers (R) into the long feedback buses at regular intervals (Figure xxxa), resulting in a type of pipelining and increasing the clock pulse frequency accordingly, because the transmission times between registers are much shorter than the transmission time directly from the sender to the receiver. However, this method results in a considerable latency time, which in turn greatly reduces the processing performance, in particular in loops.
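The trade-off between clock frequency and latency can be made concrete with a rough calculation. The sketch assumes evenly spaced registers, ignores register setup and clock-to-output times, and counts one clock cycle per bus segment; the function name and the numbers are illustrative only.

```python
def feedback_timing(bus_delay_ns, registers):
    """Estimate clock limit and feedback latency for a pipelined feedback bus.

    bus_delay_ns: transmission time of the unregistered bus (assumed value).
    registers: number of registers inserted at regular intervals.
    Returns (max_frequency_mhz, latency_cycles) under the stated simplifications.
    """
    segments = registers + 1
    segment_delay_ns = bus_delay_ns / segments   # each segment is a shorter hop
    max_frequency_mhz = 1000.0 / segment_delay_ns
    latency_cycles = segments                    # one cycle per segment
    return max_frequency_mhz, latency_cycles


unpipelined = feedback_timing(20.0, 0)
pipelined = feedback_timing(20.0, 3)
```

With these assumed numbers, three registers raise the clock limit of the bus fourfold, but each pass around the loop also takes four cycles instead of one, which is why the time per loop iteration does not improve.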

For wave reconfiguration, it is also possible to provide particularly efficient data processing when a second configuration may be configured immediately after processing the last data word of a first configuration (i.e., in the same cycle or in a cycle shortly following that one) and the first data word of the second configuration is processed immediately thereafter (i.e., in the same cycle or in a cycle which follows that one shortly).

According to Figures xxua-xxuc, however, the direction of data flow of each successive configuration changes. Thus, after each processing of a last data word, the next following configuration may be set immediately, but only after reconfiguration of all the cells and buses involved is it possible to begin with the next data processing (Figure xxsa). An approach according to the present invention thus involves largely maintaining the direction of flow between the cells and merely exchanging the bus systems of the memories (Figure xxsb). However, this again results in the problem of long run times and low clock pulse rates as described above in conjunction with feedbacks. Here again, as described already, registers which would result in an increase in clock pulse frequency might be introduced. At the same time, however, this would result in a substantial latency time which is in turn undesirable.

In a preferred variant, feedback loops with data streaming through registers are therefore avoided.

It has been found that particularly good results may be achieved when all the cells of a loop are arranged as locally as possible around a loop head (SK), and in particular the loop foot (SF) is placed as close to the SK as possible (Figure xxxb, xxxd). A helical arrangement resembling the symbol @ (Figure xxxc) is also optimal.

It is therefore proposed that, for configuration and/or reconfiguration of a multidimensional field and/or cells for data processing in which data is processed in cells, processing results are sent to cells downstream to be processed further there, and data is sent from at least one cell downstream to at least one cell upstream, the cell position be determined in such a way that the downstream cell is positioned so close to the upstream cell that the feedback time of this connection is not longer than that of any other connection in the configuration.

This may typically be achieved by arranging the downstream cell closer to the upstream cell than one-fourth of the total data streaming path.

This may be achieved particularly well when the cells having the densest data are situated between the upstream end and the downstream end in the form of a coil or in a wave-shaped pattern.

There are various possibilities now for achieving such a feedback loop minimization.

Placements may thus be performed while minimizing virtual forces between cells and other objects, and then the feedback loop minimization is achieved, for example, by introducing another “virtual” spring force from each element of a loop to the loop head (SK) and/or the loop foot (SF). Alternatively and/or additionally, a virtual force may be provided between the loop foot and the loop head. This virtual spring force does not represent a bus connection but instead is used only to achieve the placement arrangement according to the present invention. In particular, the virtual spring force may be different from the spring force of bus connections that actually exist. Other methods of automatically generating the placement arrangement will then be obvious to those skilled in the art in accordance with the particular placement principle.
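The effect of such an additional virtual spring can be illustrated with a toy force-directed relaxation. This is only a sketch under simplifying assumptions: positions are continuous rather than snapped to cell sites, each movable cell is repeatedly moved to the weighted centroid of its spring neighbors, and the cell names, weights, and the fixed element `OUT` are hypothetical.

```python
import math


def place(positions, fixed, springs, iterations=500):
    """Relax movable cells toward the weighted centroid of their spring neighbors.

    positions: name -> (x, y) start position; fixed: names that must not move;
    springs: list of (a, b, weight). Returns the relaxed positions.
    """
    pos = dict(positions)
    for _ in range(iterations):
        for cell in pos:
            if cell in fixed:
                continue
            sx = sy = total = 0.0
            for a, b, w in springs:
                if cell == a:
                    other = b
                elif cell == b:
                    other = a
                else:
                    continue
                sx += w * pos[other][0]
                sy += w * pos[other][1]
                total += w
            if total:
                pos[cell] = (sx / total, sy / total)
    return pos


def distance(pos, a, b):
    return math.dist(pos[a], pos[b])


start = {"SK": (0.0, 0.0), "c1": (1.0, 1.0), "c2": (2.0, 1.0),
         "c3": (3.0, 1.0), "SF": (4.0, 0.0), "OUT": (8.0, 0.0)}
fixed = {"SK", "OUT"}                      # assumed stationary elements
buses = [("SK", "c1", 1.0), ("c1", "c2", 1.0), ("c2", "c3", 1.0),
         ("c3", "SF", 1.0), ("SF", "OUT", 1.0)]
virtual = [("SF", "SK", 2.0)]              # virtual spring, not a bus connection

plain = place(start, fixed, buses)
looped = place(start, fixed, buses + virtual)
```

Comparing `distance(plain, "SF", "SK")` with `distance(looped, "SF", "SK")` shows the loop foot being drawn toward the loop head once the virtual spring is present, even though no bus runs between the two.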

For very large loops, the cells of the loop are arranged in a wave-shaped pattern around the SK and/or SF (Figure xxxe) or they are wound around the SK and/or SF, but a wave-shaped arrangement is preferred.

A coil may be achieved by reducing the “virtual” spring forces linearly or uniformly in steps over the length of the loop. Figure xxxg shows a corresponding example in which the spring forces are reduced incrementally. Coils have the problem that they result in relatively long buses to the core of the coil (SK, SF).

The preferred wave-shaped arrangement may be achieved by assigning to the particular cells of the loop periodically higher and lower “virtual” spring forces toward SK and/or SF. For example, such an assignment may be made by a sine function or a quasi-sine function. Such periodic “virtual” spring forces (0, 1, 2, 3) are shown as an example in Figure xxxe. The periods, i.e., the frequency of the sine function, should be determined optimally so that the first cell after the SK and the last cell before the SF (or the SF itself) have the maximum possible spring force to position them as close together locally as possible. Due to the placement while defining a virtual winding force, different tasks may be configured and/or placed.
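One possible way of generating such periodic spring forces is sketched below. The sketch uses a phase-shifted sine (a cosine) as the quasi-sine function and an integer number of periods, so that the first and last cell of the loop receive the maximum force; the function name, the parameters, and the maximum force of 3 are assumptions chosen for illustration.

```python
import math


def periodic_spring_forces(cell_count, periods, max_force=3.0):
    """Assign each loop cell a virtual spring force toward SK and/or SF.

    cell_count: number of cells in the loop; periods: integer number of full
    waves over the loop. The force varies between 0 and max_force and, for an
    integer `periods`, is maximal at the first and the last cell.
    """
    if cell_count < 2:
        return [max_force] * cell_count
    forces = []
    for i in range(cell_count):
        phase = 2.0 * math.pi * periods * i / (cell_count - 1)
        forces.append(max_force * (1.0 + math.cos(phase)) / 2.0)
    return forces


forces = periodic_spring_forces(cell_count=9, periods=2)
```

The resulting values could be quantized to a few discrete levels and passed as spring weights to a force-based placer; cells with a high force are pulled close to the loop head and foot, and cells with a low force are free to swing outward, which gives the wave shape.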

Thus, in principle methods may be used which provide for the cell position in a field having cells of selectable function to be determined by minimization of virtual forces on the cells, virtual forces different from zero being provided between the upstream cell and the downstream cell (SF, SK). A memory, in particular a multiport memory, may be provided in the path between the upstream cell and the downstream cell in particular.

Thus, a corresponding method may now be used for optimization of wave reconfiguration. First, it is stipulated that the memories for reading data and writing data are not located on the opposite sides of an array of cells but instead are situated as close together as possible locally according to SK and SF (Figure xxta). In performing a reconfiguration, only the bus systems between the read/write memories need be exchanged. The buses are therefore only minimally longer, if at all, but this does not result in any considerable impairment of the clock pulse frequency (Figure xxtb). Further optimization may be achieved by using the same memories for reading the data (operands) and for writing the results, although different memory banks or different read/write pointers in FIFO-like memories are used, for example, and preferably multiport memories are used, permitting simultaneous access to multiple ports. In such a preferred variant, switching the bus systems is also eliminated, because one and the same memory is used.

Using this principle, the direction of data flow does not change in comparison with the wave reconfiguration running direction, which yields optimum performance.

Within an array, a plurality of these arrangements may be implemented at the same time. This is shown in Figures xxva-xxvc as an example for two reconfiguration cycles. Likewise, corresponding arrangements from several sides of the array may also be used at the same time. Figures xxwa-xxwc show two corresponding reconfiguration cycles as an example.

The method according to Figure xxx is particularly efficient when the requirements of wave reconfiguration are also taken into account in such a way that SK and/or SF, for example, are to be situated as close as possible locally to a memory (RAM). This is possible, e.g., by rolling out the loop in only three directions (Figure xxxh), and this is in turn achieved through a suitable periodic buildup of the “virtual” spring forces. Depending on whether the spring forces are built up or reduced uniformly, different arrangements may be achieved. The example shown in Figure xxxh uses a uniform linear slow buildup and a rapid linear reduction.

1. A method for configuration for multidimensional fields of reconfigurable interconnected data processing cells, wherein the required connections between cells are prioritized, connections having a high priority being established first and then additional connections being established.
2. The method as recited in the preceding claim, wherein the connections are prioritized by taking into account an acceptable delay in data processing.
3. The method as recited in one of the preceding claims, wherein a boundary is defined around cells and an attempt is made first to connect the cells via connections within the boundary.
4. The method as recited in the preceding claim, wherein, when it is impossible to provide all necessary connections within the boundary, a connection is established outside of the boundary.
5. The method as recited in one of the two preceding claims, wherein, when it is impossible to establish an additional connection as necessary, a connection that has already been established is disconnected and the other connection is established, whereupon an attempt is made to provide a replacement for the connection that has been disconnected.
6. The method as recited in one of the preceding claims, wherein connections are established on which a plurality of outputs are combined and which are connected to a plurality of inputs, a connection being established in such a way that a spacer separates the input nodes and the output splits.
7. The method as recited in one of the preceding claims, wherein, after establishing the connections, the maximum latency time of the configuration is determined and/or a maximum frequency corresponding to it for the configuration operation is determined.
8. The method as recited in one of the preceding claims, wherein the prioritization is performed by taking into account the maximum allowed delay and/or the delay ratios of different connections.
9. The method as recited in one of the preceding claims, wherein the delay relationships in prioritization take into account a delay of “0,” “longer than,” “longer than or equal to” and “equal to.”
10. The method as recited in one of the preceding claims, wherein, after defining all signal travel paths along all connections, a propagation-time equalization is performed for signals converging at nodes.
11. A method for configuring and/or reconfiguring a multidimensional field and/or cells for data processing in which data is processed in cells, processing results are sent to cells downstream to be processed further there, data being sent from at least one cell downstream to at least one cell upstream, wherein the cell position is determined in such a way that the downstream cell is positioned so close to the upstream cell that the feedback time of this connection is no greater than that of any other connections in the configuration.
12. The method as recited in the preceding claim, wherein the downstream cell is closer than ¼ of the total data-streamed path in the case of the upstream cell.
13. The method as recited in one of the preceding claims, wherein the cells having the densest data are situated between the upstream end and the downstream end in the manner of a coil or in a wave-shaped pattern.
14. The method as recited in one of the preceding claims, wherein in a field having cells of a selectable function, the cell position is determined by minimization of virtual forces on the cells, virtual forces different from zero being provided between the upstream cell and the downstream cell (SF, SK).
15. The method as recited in one of the preceding claims, wherein a memory, in particular a multiport memory, is provided in the path between the upstream cell and the downstream cell.
16. A method for generating configurations for multidimensional fields of reconfigurable cells for performing predetermined applications, wherein an application is broken down into individual modules and the elements necessary for execution are placed module by module.
17. The method as recited in the preceding claim, wherein linearization (flattening) is performed in the breakdown into individual modules.
18. The method as recited in one of the preceding claims, wherein stationary elements are provided at predetermined locations in at least one module and the non-stationary elements are subsequently placed.
19. The method as recited in one of the three preceding claims, wherein the placement of elements as modules is made by minimizing assigned virtual forces among the individual movable and/or immovable objects.